Data Visualization Project -- Investigation of the most influential factors for Opioid Crisis

Scott Lai

Prescription opioids are widely used in specific areas of medicine, such as cancer therapy and pain relief. From 1999 to 2016, more than 630,000 people died from a drug overdose. As of 2016, 2.1 million Americans had an opioid use disorder, and more than 63,600 people died from drug overdoses, which makes overdose the leading cause of injury-related death in the United States (CDC, 2018). I want to use statistics to dig into the related data and come up with ideas to help end this crisis. Since different states have different limits and policies on opioid use, I connect the opioid death rate data with every county in the United States. From there, I apply several methods to the data, including data visualization to show how the effects are distributed.

Methodology (Algorithm or analysis)

After gathering the raw data, I first merge the different columns into one general data set, including the number of people of different races, genders, income levels, etc. in each county. After sorting the data by county, I try to find out how the different factors relate to the opioid death rate in each county in each year. I conduct a one-way ANOVA to find out how race relates to the number of deaths due to the opioid crisis. Reports indicate that 2013 to 2016 was the period of fastest growth for the opioid crisis, so I am interested in the relationship between each factor and the yearly change in the crisis. The results are presented on a map of America, broken down by county. Furthermore, I use different colors to represent how closely the different factors relate to the death rate, together with linear regression plots. After showing how the different factors connect to the death rates, I sum up the report by concluding which factor is most influential for the opioid crisis and which year was most affected by opioid use. With such a conclusion, people can become more aware of that factor, which will help prevent the opioid crisis or reduce it in general.

Set Up

Downloading Packages

Dataset Import and Retrieving

As a county-level project, this study uses seven datasets, which can make it hard to follow every variable. Luckily, only one or two variables from each dataset are used in this study.

The following cell runs the cleaning process: splitting county and state names into two columns, creating new variables, and dropping unused variables.
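As a rough illustration of these cleaning steps, here is a minimal sketch. It is not the notebook's original cell: the input layout ("Name County, ST" in a single column), the column names Deaths, Population and Notes, and the helper clean_counties are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical raw rows; column names and values are illustrative only.
raw = pd.DataFrame({
    "County": ["Alachua County, FL", "Bay County, FL"],
    "Deaths": [12, 20],
    "Population": [250000, 180000],
    "Notes": [None, None],  # stands in for a column the study does not use
})

def clean_counties(df):
    """Split 'Name County, ST' into County and State, add a rate, drop extras."""
    out = df.copy()
    # Split on the last ", " so county names containing commas stay intact.
    parts = out["County"].str.rsplit(", ", n=1, expand=True)
    out["County"] = parts[0].str.replace(r" County$", "", regex=True)
    out["State"] = parts[1]
    # New variable: deaths divided by total population.
    out["OpioidDeathRate"] = out["Deaths"] / out["Population"]
    # Drop columns that are not used later.
    return out.drop(columns=["Notes", "Deaths", "Population"])

cleaned = clean_counties(raw)
print(cleaned)
```

Splitting on the last comma only is a defensive choice; a real file may need extra handling for parishes, boroughs, or independent cities.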

Overview

| Dataset                                     |      Rows |   Columns | Description                                    |
|---------------------------------------------+-----------+-----------+------------------------------------------------|
| The Opioid Cause of Death                   |      4970 |        12 | Deaths due to the use of opioids               |
| The Cancer Cause of Death                   |      2082 |         9 | Deaths caused by cancer                        |
| Unemployment                                |     12876 |         9 | Unemployment rate, population, unemployed, etc |
| Income and Poverty                          |     12569 |         5 | Income and poverty percent statistics          |
| Prescribing Rates                           |      3143 |         4 | Average prescribing rate in U.S. counties      |
| Cartographic Boundary Shapefiles - States   |        56 |        10 | Cartographic boundaries for U.S. states        |
| Cartographic Boundary Shapefiles - Counties |      3233 |        10 | Cartographic boundaries for U.S. counties      |

That is a lot of data.

Luckily, we only pick a few pieces from each of them.

Here are some of the variables retrieved:

  • OpioidDeathRate : Death rate caused by opioids. OpioidDeathRate = Opioid Deaths / Total Population
  • UnemploymentRate : The unemployment rate in each county in the United States.
  • CancerDeathRate : The death rate related to cancer.
  • prescribingRateAVE : The average amount of opioids prescribed in the US from 2013 to 2016
  • ProvertyPre : Percent of people of all ages in poverty, based on median family income by family size, from 2013 to 2016
  • geometry : The geometry information for each county in the U.S.

Merge all the factors based on year from 2013 to 2016

Based on other research, 2017 was the worst year ever for drug overdose deaths in America, and the period of fastest growth for the opioid crisis was 2013 to 2016. Therefore, we choose all datasets and variables from 2013 to 2016.

In order to show the differences between factors, this study separates the dataset by year. Each year contains county names, state names, and the opioid death rate, prescribing rate, cancer death rate, and poverty percent for each county in that year.

In order to map each factor exactly, I choose to keep the missing values at this point and will drop the NaN values later when doing the regression.
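A minimal sketch of this kind of per-year merge is shown below, assuming each factor lives in its own table keyed by County and State. The tables, their values, and the helper merge_year are hypothetical; the point is only that an outer join keeps counties that are missing from some tables, leaving NaN in the gaps.

```python
import pandas as pd

# Hypothetical per-factor tables for one year; values are made up.
opioid = pd.DataFrame({"County": ["Alachua", "Bay"], "State": ["FL", "FL"],
                       "OpioidDeathRate2013": [0.10, 0.14]})
prescribing = pd.DataFrame({"County": ["Alachua", "Clay"], "State": ["FL", "FL"],
                            "PrescribingRate2013": [41.3, 60.2]})
cancer = pd.DataFrame({"County": ["Bay"], "State": ["FL"],
                       "CancerDeathRate2013": [0.064]})

def merge_year(frames, keys=("County", "State")):
    """Outer-join the factor tables so unmatched counties survive as NaN."""
    merged = frames[0]
    for frame in frames[1:]:
        merged = merged.merge(frame, on=list(keys), how="outer")
    return merged

year2013 = merge_year([opioid, prescribing, cancer])
print(year2013)

# Rows with gaps are dropped only when a regression needs complete cases.
complete = year2013.dropna()
```

Merging on both County and State matters because county names repeat across states.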

Create Plots

GeoDataFrame

Merge the geometry information with the dataset for each year to draw the visualization on the map.

Here we use the 2013 GeoDataFrame as an example.

Out[217]:
|    | County  | OpioidDeathRate2013 | State | PrescribingRate2013 | CancerDeathRate2013 | ProvertyPre | geometry                                          |
|----+---------+---------------------+-------+---------------------+---------------------+-------------+---------------------------------------------------|
|  0 | Alachua |            0.098128 | FL    |           41.295902 |            0.064920 |         NaN | POLYGON ((-82.658554 29.830144, -82.651494 29.... |
|  1 | Alachua |            0.386399 | FL    |           77.000000 |            0.051799 |         NaN | POLYGON ((-82.658554 29.830144, -82.651494 29.... |
|  2 | Alachua |            0.386399 | FL    |           76.200000 |            0.065445 |        25.7 | POLYGON ((-82.658554 29.830144, -82.651494 29.... |
|  3 | Bay     |            0.141877 | FL    |           41.491803 |            0.063650 |         NaN | POLYGON ((-85.99471199999999 30.311702, -85.99... |
|  4 | Bay     |            0.193469 | FL    |           41.687705 |            0.062380 |         NaN | POLYGON ((-85.99471199999999 30.311702, -85.99... |
Out[175]:
County                    0
OpioidDeathRate2013    6054
State                     0
PrescribingRate2013    4697
CancerDeathRate2013    6641
ProvertyPre            4353
geometry                  0
dtype: int64

The table above shows the number of missing values for each factor. The merge produces many missing values because some counties did not report one or more factors, which then appear as NaN. I did not drop or replace the missing values, because doing so would cause visibility problems in the plots and make the distribution on the map hard to read. (If I replaced NaN with 0, every missing value would be plotted in a light color close to that of the real data, since the rates are mostly at a low level close to 0, and the added zeros would be hard to distinguish.) So I prefer to keep the missing values as empty space.
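The sketch below shows, on made-up numbers, how a per-column missing count like the one above can be produced and why filling with 0 distorts the picture. The small rates table is hypothetical and only stands in for the merged data.

```python
import numpy as np
import pandas as pd

# Hypothetical rates with gaps; values are made up.
rates = pd.DataFrame({
    "County": ["A", "B", "C", "D"],
    "OpioidDeathRate2013": [0.10, np.nan, 0.30, np.nan],
})

# Per-column count of missing values.
missing = rates.isna().sum()

# Filling with 0 adds artificial low values; keeping NaN leaves the mean untouched.
mean_keep_nan = rates["OpioidDeathRate2013"].mean()            # NaN is skipped
mean_fill_zero = rates["OpioidDeathRate2013"].fillna(0).mean()
print(missing, mean_keep_nan, mean_fill_zero)
```

In this toy case the mean drops from 0.2 to 0.1 once the gaps are filled with zeros, which is the same effect that would wash out the map colors.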

Plot the geometry distribution

The plots below are county-level distribution maps of the four variables from 2013 to 2016.

The plots above show the opioid death rate, the cancer death rate, the prescribing rate, and the poverty percent from 2013 to 2016.

What we can see from the plots is that most opioid deaths happened in counties with a high cancer death rate, high prescribing rate, and high poverty percent (the dark colors are all located in approximately the same areas). Opioid deaths are mostly on the east coast, with some on the west coast. This distribution is the same for all plots above in each year.

Horizontal Comparison

To see the change over the years, we can combine all the plots above into one figure, with year on the x-axis and the four factors on the y-axis.

This is amazing! From the figure above, we can analyze the yearly change in each factor just from the maps.

From the color change from light to dark, we can see that the rates keep increasing from 2013 to 2016. Most deaths happened on the east coast, with large variation and a wide spread. But the plot also shows that the densest area is in the northeast, in states like New York, Massachusetts, and New Jersey.

Also, from the graph, it is not hard to see that places with a higher poverty rate share a higher prescribing rate, which is also related to both death rates.

From the plot, it is hard to see the details of the change or of the relationships between factors, so I use ANOVA and regression to dig into them.

ANOVA for categorical data (race)

                sum_sq      df          F        PR(>F)
C(Race)   9.597206e+04     3.0  15.264987  6.933151e-10
Residual  1.040720e+07  4966.0        NaN           NaN
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Deaths   R-squared:                       0.009
Model:                            OLS   Adj. R-squared:                  0.009
Method:                 Least Squares   F-statistic:                     15.26
Date:                Tue, 11 Dec 2018   Prob (F-statistic):           6.93e-10
Time:                        14:14:04   Log-Likelihood:                -26055.
No. Observations:                4970   AIC:                         5.212e+04
Df Residuals:                    4966   BIC:                         5.214e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
========================================================================================================
                                           coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------------------
Intercept                               13.5333      6.824      1.983      0.047       0.155      26.912
C(Race)[T.Asian or Pacific Islander]     2.7792     10.586      0.263      0.793     -17.974      23.532
C(Race)[T.Black or African American]    13.6464      7.134      1.913      0.056      -0.340      27.633
C(Race)[T.White]                        25.0362      6.859      3.650      0.000      11.589      38.483
==============================================================================
Omnibus:                     4109.935   Durbin-Watson:                   1.379
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           108649.892
Skew:                           3.890   Prob(JB):                         0.00
Kurtosis:                      24.544   Cond. No.                         30.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

This table shows the output of the ANOVA and whether there is a statistically significant difference between race groups. The significance value is 6.933151e-10 (i.e., p = 6.933151e-10), which is below 0.05; therefore, there is a statistically significant difference in opioid deaths between the race groups. This is great to know, but it does not tell us which specific groups differed. Luckily, we can find this out from the OLS regression table, which contains the results of the race group regression.

The OLS regression table gives coefficients for Asian or Pacific Islander (2.7792), Black or African American (13.6464), and White (25.0362), each measured relative to the reference group absorbed into the intercept. Asian or Pacific Islander has the lowest coefficient and White has the highest; only the White coefficient is significant at the 0.05 level (p = 0.793, 0.056, and 0.000 respectively). As a result, we can speculate that most people suffering from the opioid crisis are white, followed by the Black community and then other groups such as Asian and Pacific Islander.
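To make the sum_sq, df and F columns of the ANOVA table concrete, here is a small worked sketch of the one-way ANOVA arithmetic. It uses plain numpy rather than the statsmodels routine that produced the output above, the helper one_way_anova is hypothetical, and the three sample groups are made up. The last lines recompute F from the sums of squares and degrees of freedom reported in the table.

```python
import numpy as np

def one_way_anova(groups):
    """Return (ss_between, df_between, ss_within, df_within, F) for a list of samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_stat

# Made-up samples for three groups, just to exercise the function.
example = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

# Check against the reported table: F = (sum_sq / df) of C(Race) over that of Residual.
f_from_table = (9.597206e04 / 3.0) / (1.040720e07 / 4966.0)
print(example[-1], f_from_table)
```

The recomputed value agrees with the F of about 15.26 in the table. The same numbers also give 9.597206e+04 / (9.597206e+04 + 1.040720e+07), roughly 0.009, which is the R-squared reported in the regression output.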

Plot the regression distribution

From the previous mapping, we found that opioid death rates are possibly related to factors such as the prescribing rate in each county, the poverty rate, and the cancer death rate.

Before fitting the regression, I first deal with the missing values in my data, using interpolate() to fill in the missing data in my datasets.
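For reference, this is what pandas interpolate() does with its default settings on a small made-up series. The exact arguments used in the notebook are not shown, so the default linear method is an assumption here.

```python
import numpy as np
import pandas as pd

# Made-up series with interior gaps and a leading gap.
rate = pd.Series([np.nan, 0.10, np.nan, np.nan, 0.40])

# Default interpolate() is linear in row position: interior NaN values are
# replaced by evenly spaced points between the neighbouring observations.
# The leading NaN has no earlier observation, so it stays NaN by default.
filled = rate.interpolate()
print(filled.tolist())
```

Because the filled values depend on row order rather than on anything about the county itself, they tend to line up in regular bands when plotted.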

Then, I draw scatter plots to show the relationships.

The plot above shows the relationship between opioid death rate and prescribing rate in each year from 2013 to 2016. The prescribing rate suddenly increases in 2015 (spreading out in y), and as a result, the opioid death rate increases in 2016 (spreading out in x).

We can ignore the dense bar-shaped cluster in the graph; it is made up of the fitted values that interpolate() generated for NaN.

As before, ignore the bar-shaped data, which comes from NaN.

Above we can see the relationship between the opioid death rate and the cancer death rate. From 2013 to 2016, the data spread out a lot compared with the starting year (the shape changes from a bar in 2013 to a round cloud in 2016), which suggests the two variables become more and more related. The more spread-out data show an approximately linear relationship between the two variables.

Linear Regression

To confirm what we saw in the plots above, we run an OLS linear regression to check whether the result holds.

First, we run OLS between opioid death rate and prescribing rate for each year.
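A per-year fit of this kind can be sketched as follows. This is not the original cell: it uses numpy.polyfit instead of the statsmodels OLS call that produced the summaries, so it returns only the intercept and slope (no standard errors or p-values). The helper yearly_slopes and the synthetic data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def yearly_slopes(df, years, y_prefix="OpioidDeathRate", x_prefix="PrescribingRate"):
    """Fit y = intercept + slope * x for each year on complete rows only."""
    results = {}
    for year in years:
        pair = df[[f"{x_prefix}{year}", f"{y_prefix}{year}"]].dropna()
        # polyfit returns coefficients from highest degree down: [slope, intercept].
        slope, intercept = np.polyfit(pair.iloc[:, 0], pair.iloc[:, 1], deg=1)
        results[year] = {"intercept": intercept, "slope": slope, "n": len(pair)}
    return pd.DataFrame(results).T

# Synthetic data with known slopes (negative in 2013, positive in 2014).
x = np.array([40.0, 50.0, 60.0, 70.0, np.nan])
demo = pd.DataFrame({
    "PrescribingRate2013": x, "OpioidDeathRate2013": 0.38 - 0.0002 * x,
    "PrescribingRate2014": x, "OpioidDeathRate2014": 0.41 + 0.0008 * x,
})
table = yearly_slopes(demo, [2013, 2014])
print(table)
```

Collecting the slopes in one table makes the sign change between years easy to read off without scanning each full summary.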

                             OLS Regression Results                            
===============================================================================
Dep. Variable:     OpioidDeathRate2013   R-squared:                       0.006
Model:                             OLS   Adj. R-squared:                  0.005
Method:                  Least Squares   F-statistic:                     42.02
Date:                 Tue, 11 Dec 2018   Prob (F-statistic):           9.61e-11
Time:                         14:32:11   Log-Likelihood:                 7193.9
No. Observations:                 7432   AIC:                        -1.438e+04
Df Residuals:                     7430   BIC:                        -1.437e+04
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.3803      0.003    126.871      0.000       0.374       0.386
PrescribingRate2013    -0.0002   3.24e-05     -6.482      0.000      -0.000      -0.000
==============================================================================
Omnibus:                     2012.003   Durbin-Watson:                   0.678
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           198058.250
Skew:                           0.116   Prob(JB):                         0.00
Kurtosis:                      28.289   Cond. No.                         260.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

From above, we can see the relationship between opioid death rate and prescribing rate in 2013: the coef is -0.0002, a negative relationship with a negative slope. An increase in the prescribing rate is associated with a decrease in the opioid death rate.

This means 2013 does not look too bad: at least prescriptions could save people from suffering.

                             OLS Regression Results                            
===============================================================================
Dep. Variable:     OpioidDeathRate2014   R-squared:                       0.040
Model:                             OLS   Adj. R-squared:                  0.039
Method:                  Least Squares   F-statistic:                     310.1
Date:                 Tue, 11 Dec 2018   Prob (F-statistic):           4.72e-68
Time:                         15:13:59   Log-Likelihood:                 4914.8
No. Observations:                 7524   AIC:                            -9826.
Df Residuals:                     7522   BIC:                            -9812.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.4136      0.003    119.638      0.000       0.407       0.420
PrescribingRate2014     0.0008   4.31e-05     17.610      0.000       0.001       0.001
==============================================================================
Omnibus:                     2309.592   Durbin-Watson:                   0.536
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            50736.684
Skew:                          -0.945   Prob(JB):                         0.00
Kurtosis:                      15.580   Cond. No.                         191.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

From above, we can see the relationship between opioid death rate and prescribing rate in 2014: the coef is 0.0008, a positive relationship with a positive slope. An increase in the prescribing rate is associated with an increase in the opioid death rate.

                             OLS Regression Results                            
===============================================================================
Dep. Variable:     OpioidDeathRate2015   R-squared:                       0.000
Model:                             OLS   Adj. R-squared:                  0.000
Method:                  Least Squares   F-statistic:                     1.257
Date:                 Tue, 11 Dec 2018   Prob (F-statistic):              0.262
Time:                         14:57:50   Log-Likelihood:                 7108.8
No. Observations:                 7571   AIC:                        -1.421e+04
Df Residuals:                     7569   BIC:                        -1.420e+04
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.2884      0.003    111.572      0.000       0.283       0.293
PrescribingRate2015  3.585e-05    3.2e-05      1.121      0.262   -2.68e-05    9.85e-05
==============================================================================
Omnibus:                     8602.334   Durbin-Watson:                   1.101
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          2303344.930
Skew:                           5.492   Prob(JB):                         0.00
Kurtosis:                      87.740   Cond. No.                         192.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

From above, we can see the relationship between opioid death rate and prescribing rate in 2015: the coef is 3.585e-05, a positive relationship with a positive slope, although it is not statistically significant (p = 0.262). An increase in the prescribing rate is associated with an increase in the opioid death rate.

As one of the most horrible years of the opioid crisis, this result is not a surprise. From 2014 to 2015, the prescription of opioid medicine affected the opioid crisis a lot.

                             OLS Regression Results                            
===============================================================================
Dep. Variable:     OpioidDeathRate2016   R-squared:                       0.002
Model:                             OLS   Adj. R-squared:                  0.002
Method:                  Least Squares   F-statistic:                     17.80
Date:                 Tue, 11 Dec 2018   Prob (F-statistic):           2.49e-05
Time:                         14:57:51   Log-Likelihood:                 4766.9
No. Observations:                 7835   AIC:                            -9530.
Df Residuals:                     7833   BIC:                            -9516.
Df Model:                            1                                         
Covariance Type:             nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.2712      0.003     83.015      0.000       0.265       0.278
PrescribingRate2016    -0.0002   4.95e-05     -4.219      0.000      -0.000      -0.000
==============================================================================
Omnibus:                     8081.723   Durbin-Watson:                   0.929
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           568987.339
Skew:                           5.127   Prob(JB):                         0.00
Kurtosis:                      43.469   Cond. No.                         145.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

From above, we can see the relationship between opioid death rate and prescribing rate in 2016: the coef is -0.0002, a negative relationship with a negative slope. An increase in the prescribing rate is associated with a decrease in the opioid death rate.

From the result, the coef changes back to negative, which might be related to the prescribing limits set by the government in 2016. They appear to work, bringing the problem back to normal.

Regression Plot

To show the relationship directly, we make the graph below.

Out[200]:
Text(0.5, 1.0, 'Predicted Prescribing Rate probabilities from an OLS model')

As the result above shows, the plot agrees with the tables: the slope is negative in 2013 and 2016 and positive in 2014 and 2015. The 2014 slope rises the most, making it the year most affected by prescribing.

As with the plot above, we can also get the result for cancer death rates.

Out[213]:
Text(0.5, 1.0, 'Predicted Cancer Death Rate probabilities from an OLS model')

The plot above shows the relationship between opioid death rate and cancer death rate, where 2014 has a higher cancer death rate than the other years.

The plot above shows the opioid death rate in relation to people of all ages in poverty.

To cover all the factors that might affect the opioid death rate, we run a linear regression on all factors.

The tables below show the linear regression on all factors for each year. From the coefs, we can infer that 2014 was a big year for opioid use and that cancer death rate was the most influential factor for opioid deaths, with a coef of 4.4823, which is higher than the other factors.
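One caveat when ranking factors by coefficient size is that raw coefficients depend on each factor's units: prescribing rates are in the tens, while cancer death rates are small fractions. The sketch below, which is an added illustration and not part of the original analysis, fits a multiple regression with numpy's least-squares solver on synthetic data and compares raw coefficients with standardized ones. The helper names and all numbers are assumptions.

```python
import numpy as np

def ols_coefficients(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] for y ~ X."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta

def standardized_coefficients(X, y):
    """Coefficients after z-scoring every column, so magnitudes are comparable."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    return ols_coefficients(Xz, yz)[1:]

# Synthetic predictors on very different scales (made-up numbers).
rng = np.random.default_rng(0)
prescribing = rng.uniform(20, 120, size=200)   # tens to hundreds
cancer = rng.uniform(0.04, 0.08, size=200)     # small fractions
X = np.column_stack([prescribing, cancer])
y = 0.2 + 0.001 * prescribing + 1.0 * cancer

raw = ols_coefficients(X, y)
std = standardized_coefficients(X, y)
print(raw, std)
```

In this synthetic case the cancer variable has the much larger raw coefficient, yet the prescribing variable has the larger standardized one. The large condition numbers flagged in the summaries below are at least partly a symptom of the same scale difference.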

                             OLS Regression Results                            
===============================================================================
Dep. Variable:     OpioidDeathRate2013   R-squared:                       0.179
Model:                             OLS   Adj. R-squared:                  0.178
Method:                  Least Squares   F-statistic:                     538.3
Date:                 Tue, 11 Dec 2018   Prob (F-statistic):          1.38e-316
Time:                         16:11:03   Log-Likelihood:                 7904.0
No. Observations:                 7432   AIC:                        -1.580e+04
Df Residuals:                     7428   BIC:                        -1.577e+04
Df Model:                            3                                         
Covariance Type:             nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.2200      0.005     44.812      0.000       0.210       0.230
PrescribingRate2013 -4.364e-05   3.04e-05     -1.437      0.151      -0.000    1.59e-05
CancerDeathRate2013     1.8366      0.052     35.243      0.000       1.734       1.939
ProvertyPre             0.0025      0.000     23.601      0.000       0.002       0.003
==============================================================================
Omnibus:                     4695.614   Durbin-Watson:                   0.851
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           510445.372
Skew:                           2.134   Prob(JB):                         0.00
Kurtosis:                      43.375   Cond. No.                     4.99e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 4.99e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     OpioidDeathRate2015   R-squared:                       0.003
Model:                             OLS   Adj. R-squared:                  0.002
Method:                  Least Squares   F-statistic:                     7.037
Date:                 Tue, 11 Dec 2018   Prob (F-statistic):           0.000101
Time:                         15:59:07   Log-Likelihood:                 7118.7
No. Observations:                 7571   AIC:                        -1.423e+04
Df Residuals:                     7567   BIC:                        -1.420e+04
Df Model:                            3                                         
Covariance Type:             nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.2685      0.005     51.239      0.000       0.258       0.279
PrescribingRate2015  7.828e-05   3.47e-05      2.255      0.024    1.02e-05       0.000
CancerDeathRate2015     0.1978      0.045      4.392      0.000       0.110       0.286
ProvertyPre            -0.0004      0.000     -2.205      0.027      -0.001   -4.04e-05
==============================================================================
Omnibus:                     8807.659   Durbin-Watson:                   1.105
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          2428389.691
Skew:                           5.733   Prob(JB):                         0.00
Kurtosis:                      89.986   Cond. No.                     3.37e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.37e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
                             OLS Regression Results                            
===============================================================================
Dep. Variable:     OpioidDeathRate2016   R-squared:                       0.050
Model:                             OLS   Adj. R-squared:                  0.050
Method:                  Least Squares   F-statistic:                     137.3
Date:                 Tue, 11 Dec 2018   Prob (F-statistic):           1.06e-86
Time:                         15:59:31   Log-Likelihood:                 4958.9
No. Observations:                 7835   AIC:                            -9910.
Df Residuals:                     7831   BIC:                            -9882.
Df Model:                            3                                         
Covariance Type:             nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.3628      0.006     62.196      0.000       0.351       0.374
PrescribingRate2016    -0.0004   5.06e-05     -6.956      0.000      -0.000      -0.000
CancerDeathRate2016    -0.8718      0.060    -14.520      0.000      -0.990      -0.754
ProvertyPre         -2.444e-05      0.000     -0.103      0.918      -0.000       0.000
==============================================================================
Omnibus:                     7582.048   Durbin-Watson:                   0.981
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           517097.870
Skew:                           4.597   Prob(JB):                         0.00
Kurtosis:                      41.723   Cond. No.                     2.75e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.75e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
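
Warning [2] does not by itself prove multicollinearity: a large condition number can also arise when predictors sit on very different scales. The following is a minimal sketch of that effect on synthetic data, not the project's data. The variable names and the scales chosen for them are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic, independent stand-ins for the three predictors.
# The means and spreads are made up; they only mimic "very different scales".
prescribing = rng.normal(80.0, 40.0, n)
cancer = rng.normal(0.2, 0.05, n)
poverty = rng.normal(15.0, 6.0, n)


def design(*cols):
    """Stack an intercept column and the given predictor columns."""
    return np.column_stack([np.ones(len(cols[0])), *cols])


def standardize(x):
    """Center a column to mean 0 and scale it to unit standard deviation."""
    return (x - x.mean()) / x.std()


raw = design(prescribing, cancer, poverty)
scaled = design(standardize(prescribing), standardize(cancer), standardize(poverty))

# Condition number = ratio of largest to smallest singular value of the matrix.
cond_raw = np.linalg.cond(raw)
cond_scaled = np.linalg.cond(scaled)

print(f"raw design matrix:          {cond_raw:.1f}")
print(f"standardized design matrix: {cond_scaled:.1f}")
```

If the condition number of the real design matrix dropped in a similar way after standardizing, scale differences rather than correlated predictors would be the more likely explanation. If it stayed large, checking pairwise correlations between the predictors would be a reasonable next step.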

Summary

Conclusion

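As a result, we can see that, of the factors examined, the cancer death rate is the one most strongly related to the opioid death rate, and that 2014 had the largest increase in opioid deaths. The relationship is weak, however: the 2016 model explains only about 5% of the variance (R-squared = 0.050).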

Prescription opioids can be used to treat moderate-to-severe pain and are often prescribed following surgery or injury, or for health conditions such as cancer. In recent years, there has been a dramatic increase in the acceptance and use of prescription opioids for the treatment of chronic, non-cancer pain, such as back pain or osteoarthritis, despite serious risks and the lack of evidence about their long-term effectiveness. Because prescribing is one of the most influential factors in opioid deaths, the government should regulate prescription opioids with strict laws so that opioid use is reduced at the source.

Prescription opioids can still be helpful, especially in cancer treatment, which may be why the cancer death rate is the factor most closely related to the opioid death rate. To address this, I suggest that people consider alternative medicines, which might not be as effective as opioids but do not carry the same risk of addiction.

Surprisingly, I found that the relation between the poverty rate and the opioid death rate changed over the years: in 2013 a higher poverty rate was associated with a higher opioid death rate, but from 2014 onward a higher poverty rate was associated with a lower opioid death rate. I did some searching online, and the results show that the price of opioid medicines increased several times between 2013 and 2015. That might be the reason poorer people could not afford medication for cancer or opioid use, which would produce the large increase in both death rates. To address this, I think the government needs to pass laws that limit price changes for opioid medicines. I know some pharmaceutical companies produce opioids because of the large profits behind them; with regulation or restriction, the balance between profit and manufacturing would be broken, so that opioids could be replaced by something more useful for treatment.

Future Directions

The weaknesses of my model can be overcome. The most important one, the small sample size left after merging the data sets, can be remedied as more data becomes available over time.

Real-world data always has many missing values, and my method did not handle them well: it replaced each missing value using a basic method such as the mean or median. A better approach would be to fit a linear regression between my variables, or to use a K-NN method, to generate the best-fitting values for the missing entries, which should distort the variance of the data less than a single mean or median does.
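
As a rough illustration of the K-NN idea, the sketch below fills each missing entry with the average of that column over the k most similar rows. The function name `knn_impute`, the toy values, and the choice of distance are assumptions for illustration, not the method used in this project.

```python
import numpy as np


def knn_impute(X, k=3):
    """Fill each NaN with the mean of its column over the k nearest rows.

    Distance between two rows is the root mean squared difference over
    the columns that are observed in both rows.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    n_rows = X.shape[0]
    for i in range(n_rows):
        for j in np.where(np.isnan(X[i]))[0]:
            candidates = []
            for r in range(n_rows):
                # A donor row must be a different row with column j observed.
                if r == i or np.isnan(X[r, j]):
                    continue
                shared = ~np.isnan(X[i]) & ~np.isnan(X[r])
                if shared.any():
                    dist = np.sqrt(np.mean((X[i, shared] - X[r, shared]) ** 2))
                    candidates.append((dist, r))
            if not candidates:
                continue  # no usable neighbor: leave the value missing
            candidates.sort()
            nearest = [r for _, r in candidates[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled


# Toy table: two clusters of rows; the third row is missing its second value.
toy = [
    [10.0, 1.0],
    [11.0, 1.1],
    [12.0, np.nan],
    [50.0, 5.0],
    [52.0, 5.2],
]
result = knn_impute(toy, k=2)
column_mean = np.nanmean(np.asarray(toy)[:, 1])
print("K-NN fill:", result[2, 1], "column-mean fill:", column_mean)
```

In this toy table the K-NN fill comes from the two rows that resemble the incomplete row, whereas the column mean is pulled toward the unrelated cluster. For real use, scikit-learn's `KNNImputer` offers a ready-made implementation of a similar idea.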

My analysis also does not account for the month or day, which likely has a strong effect on opioid deaths, so some form of time series analysis would likely give me greater power to detect differences in the variables of interest.

Other ways to improve the model include the use of cross-validation to see if a polynomial term of some variables could provide more predictive power without overfitting.
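
The cross-validation check described above could look roughly like the following sketch. It compares polynomial degrees by k-fold mean squared error on synthetic data with a known quadratic relationship. The helper `cv_mse`, the data, and the candidate degrees are all assumptions for illustration.

```python
import numpy as np


def cv_mse(x, y, degree, k=5, seed=0):
    """Average held-out mean squared error of a polynomial fit over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)  # every index not in the held-out fold
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[fold])
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))


# Synthetic data: the true relationship is quadratic plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, 200)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(0.0, 0.3, 200)

scores = {degree: cv_mse(x, y, degree) for degree in (1, 2, 3)}
for degree, score in scores.items():
    print(f"degree {degree}: held-out MSE = {score:.3f}")
```

A polynomial term is worth keeping only if it lowers the held-out error by a clear margin. A higher degree that merely matches the simpler model's error adds complexity without predictive benefit.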

Another area of interest would be to compare large cities such as NYC or Miami instead of analyzing only counties.

References

[1] Centers for Disease Control and Prevention. (2018, December 5). Multiple Cause of Death, 1999-2017 Request. Retrieved December 5, 2018, from https://wonder.cdc.gov/mcd.html

[2] Centers for Disease Control and Prevention. (2018, December 5). Prescription Opioid Data. Retrieved December 5, 2018, from https://www.cdc.gov/drugoverdose/data/prescribing.html

[3] Centers for Disease Control and Prevention. (2018, December 5). Cancer Data and Statistics. Retrieved December 5, 2018, from https://www.cdc.gov/cancer/dcpc/data/index.html

[4] U.S. Census Bureau. (2018, December 5). Income and Poverty in the United States (2013-2016). Retrieved December 5, 2018, from https://www.census.gov/data/tables/2017/demo/income-poverty/p60-259.html

[5] U.S. Census Bureau. (2018, December 5). Cartographic Boundary Shapefiles - States. Retrieved December 5, 2018, from https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html

[6] U.S. Census Bureau. (2018, December 5). Cartographic Boundary Shapefiles - Counties. Retrieved December 5, 2018, from https://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html